Algorithms for data streams

Authors

  • CAMIL DEMETRESCU
  • IRENE FINOCCHI
Abstract

Efficient processing of massive data sets has taken on increased importance in the last few decades, owing to the growing availability of large volumes of data in a variety of applications in the computational sciences. In particular, monitoring huge and rapidly changing streams of data that arrive online has emerged as an important data management problem: relevant applications include analyzing network traffic, online auctions, transaction logs, telephone call records, automated bank machine operations, and atmospheric and astronomical events. For these reasons, the streaming model has recently received a lot of attention.

This model differs from computation over traditional stored data sets in that algorithms must process their input by making one or a small number of passes over it, using only a limited amount of working memory. The streaming model applies to settings where the size of the input far exceeds the size of the available main memory and the only feasible access to the data is by making one or more passes over it. Typical streaming algorithms use space at most polylogarithmic in the length of the input stream and must have fast update and query times.

Using sublinear space motivates the design of summary data structures with small memory footprints, also known as synopses [34]. Queries are answered using information provided by these synopses, and it may be impossible to produce an exact answer. The challenge is thus to produce high-quality approximate answers, that is, answers with confidence bounds on the possible error: accuracy guarantees are typically stated in terms of a pair of user-specified parameters, ε and δ, meaning that the error in answering a query is within a factor of 1 + ε of the true answer with probability at least 1 − δ. The space and update time depend on these parameters, and the goal is to limit this dependence as much as possible.
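One common way to write the (ε, δ) guarantee formally is sketched here, assuming a relative-error reading of "within a factor of 1 + ε". The symbols A (the true answer) and Â (the answer computed from the synopsis) are introduced only for illustration and do not come from the abstract.

```latex
\Pr\bigl[\, |\hat{A} - A| \le \varepsilon A \,\bigr] \;\ge\; 1 - \delta
```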
Major progress has been achieved in the last 10 years in the design of streaming algorithms for several fundamental data sketching and statistics problems, for which several different synopses have been proposed. Examples include number of distinct
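As an illustration of a small-footprint synopsis with a tunable accuracy parameter, the following is a minimal sketch of a k-minimum-values estimator for the number of distinct items in a stream. It is a swapped-in example, not necessarily a method the authors present. The class name `KMVSketch`, the parameter `k`, and the use of SHA-1 as the hash function are assumptions made for this example. The synopsis stores at most `k` hash values, and its relative error is expected to shrink roughly as 1/sqrt(k).

```python
import hashlib
import heapq


class KMVSketch:
    """Hypothetical k-minimum-values synopsis for counting distinct items."""

    def __init__(self, k=256):
        self.k = k
        self._heap = []         # max-heap (values negated) of the k smallest hashes
        self._members = set()   # hash values currently stored, for de-duplication

    @staticmethod
    def _hash(item):
        # Map an item to a pseudo-random float in [0, 1) (assumption: SHA-1).
        digest = hashlib.sha1(repr(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def update(self, item):
        """Process one stream item in a single pass."""
        h = self._hash(item)
        if h in self._members:
            return  # hash already stored, so this item was seen before
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # New hash is smaller than the current k-th smallest: swap it in.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def estimate(self):
        """Return the estimated number of distinct items seen so far."""
        if len(self._heap) < self.k:
            return float(len(self._heap))  # fewer than k distinct hashes: exact
        kth_smallest = -self._heap[0]
        return (self.k - 1) / kth_smallest


if __name__ == "__main__":
    sketch = KMVSketch(k=256)
    for i in range(100_000):
        sketch.update(i % 20_000)   # stream with 20,000 distinct values
    print("estimated distinct items:", round(sketch.estimate()))
```

Memory use depends only on `k`, not on the stream length, which is the trade-off the abstract describes: a larger synopsis buys a tighter error bound.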





Publication date: 2006